# Set up libraries
# ...

library(tidyverse)
library(plotly)
library(Rtsne)
library(umap)
library(ggplot2)

In this exercise sheet, we want to compare different non-linear dimension reduction techniques. We will work with a 3D point cloud that we project into 2D.

To begin with, we need to load the data. In the zip archive, you can find the file point_cloud.csv. Read the csv file of the point cloud.
The point cloud is too large to work with comfortably. Reduce it by keeping only every n-th point. A value of around n = 400 should give acceptable performance in the subsequent tasks, depending on your computer's specifications.

# Load the csv file and select a subset of the rows
# ...


# Set the value of n for downsampling
n <- 400

# Read the CSV file
point_cloud <- read_csv("point_cloud.csv")

# Reduce the point cloud by keeping every n-th point
reduced_point_cloud <- point_cloud %>%
  slice(seq(1, nrow(point_cloud), n))

Visualize the point cloud in 3D. You may use the plotly library to do so. After inspecting the point cloud in 3D, discuss what it might depict.

# Plot the point cloud in 3D
# ...

# Create 3D plot
# Columns x, y, z hold the coordinates; request a 3D scatter trace explicitly
plot_ly(reduced_point_cloud, x = ~x, y = ~y, z = ~z,
        type = "scatter3d", mode = "markers")

What might the point cloud depict?
Answer: Based on the spatial arrangement of points, it is highly likely that the point cloud represents an elephant or an object with a strikingly similar silhouette.

Use t-SNE to project the point cloud into 2D. You may use the Rtsne package to do so. Try different values for the perplexity parameter of t-SNE and plot the different results. Based on your results, discuss which perplexity value seems to preserve the original structure the best.

# Perform t-SNE and plot the results
# ...
# Perform t-SNE with different perplexity values and store the results
perplexity_values <- c(5, 10, 20, 30, 50)
tSNE_results <- list()

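set.seed(42)  # t-SNE is stochastic; fix the seed so the runs are reproducible
for (perplexity in perplexity_values) {
  # Rtsne requires 3 * perplexity < nrow(X) - 1, so very small subsets limit the usable perplexity
  tSNE_results[[as.character(perplexity)]] <- Rtsne(as.matrix(reduced_point_cloud), perplexity = perplexity, dims = 2)
}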

# Step : Plot the different results using plotly
plots <- list()

for (perplexity in perplexity_values) {
  tsne_result <- tSNE_results[[as.character(perplexity)]]
  # Named tsne_plot to avoid shadowing base::plot
  tsne_plot <- plot_ly(x = tsne_result$Y[, 1], y = tsne_result$Y[, 2], type = "scatter", mode = "markers",
                       marker = list(size = 8),
                       name = paste0("Perplexity ", perplexity))

  plots[[as.character(perplexity)]] <- tsne_plot
}

# Combine the plots into one
final_plot <- subplot(plots)

# Display the final plot
final_plot
[Figure: t-SNE projections, one panel each for perplexity 5, 10, 20, 30, and 50]

Which value for the perplexity works best to preserve the original structure?
Answer: In general, low perplexity values (e.g., 5-30) emphasize the preservation of local structure, which suits datasets with many densely packed clusters or intricate local relationships. Higher perplexity values (e.g., 50-100) emphasize the preservation of global structure and are better suited to datasets with well-separated clusters and broader patterns.

In our case, I believe that a perplexity of 50 preserves the original structure best in 2D.
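
The visual judgement above can be backed by a number. The following is a minimal sketch, not part of the exercise solution: it scores an embedding by the Spearman rank correlation between all pairwise distances in the original space and in the projection, which mainly reflects global structure. The function name `distance_preservation` and the synthetic data are hypothetical, chosen only for illustration.

```r
# Sketch: score how well a low-dimensional embedding preserves pairwise distances.
# All names below (distance_preservation, cloud_3d, ...) are hypothetical.

distance_preservation <- function(original, embedding) {
  # Spearman rank correlation between all pairwise distances
  # in the original space and in the embedding
  d_orig <- as.vector(dist(as.matrix(original)))
  d_emb  <- as.vector(dist(as.matrix(embedding)))
  cor(d_orig, d_emb, method = "spearman")
}

set.seed(1)
# Synthetic stand-in for a point cloud: wide in x and y, almost flat in z
cloud_3d <- cbind(x = rnorm(200, sd = 5),
                  y = rnorm(200, sd = 5),
                  z = rnorm(200, sd = 0.1))

# A faithful projection: drop the nearly constant z axis
good_2d <- cloud_3d[, c("x", "y")]
# A deliberately bad one: same coordinates, but rows shuffled
bad_2d <- good_2d[sample(nrow(good_2d)), ]

score_good <- distance_preservation(cloud_3d, good_2d)  # close to 1
score_bad  <- distance_preservation(cloud_3d, bad_2d)   # close to 0
```

Applied to this exercise, a call such as `distance_preservation(reduced_point_cloud, tSNE_results[["50"]]$Y)` would give one score per perplexity value, assuming `reduced_point_cloud` holds only the numeric coordinate columns.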

Now, use UMAP to perform the projection. You may refer to the package umap to do so. Experiment with the parameters n_neighbors and min_dist and plot the different results. Discuss which combination of parameter values works best to preserve the original structure.

# Perform UMAP and plot the results
# ...

# Perform UMAP with different parameter values and store the results
n_neighbors_values <- c(5, 10, 20, 30)
min_dist_values <- c(0.1, 0.2, 0.5, 0.7)
umap_results <- list()

for (n_neighbors in n_neighbors_values) {
  for (min_dist in min_dist_values) {
    key <- paste0("n_neighbors=", n_neighbors, "_min_dist=", min_dist)
    umap_results[[key]] <- umap(reduced_point_cloud, n_neighbors = n_neighbors, min_dist = min_dist)
  }
}

# Plot the different results using plotly
plots_UMAP <- list()

for (i in seq_along(n_neighbors_values)) {
  for (j in seq_along(min_dist_values)) {
    key <- paste0("n_neighbors=", n_neighbors_values[i], "_min_dist=", min_dist_values[j])
    umap_result <- umap_results[[key]]
    plot_UMAP <- plot_ly(x = umap_result$layout[, 1], y = umap_result$layout[, 2], type = "scatter", mode = "markers",
                    marker = list(size = 3),
                    name = key)
    
    plots_UMAP[[key]] <- plot_UMAP
  }
}

# Combine the plots into one
final_plot_UMAP <- subplot(plots_UMAP)

# Display the final plot
final_plot_UMAP
[Figure: UMAP projections, one panel for each combination of n_neighbors in {5, 10, 20, 30} and min_dist in {0.1, 0.2, 0.5, 0.7}]

# Plot the specific result for 'n_neighbors = 30' and 'min_dist = 0.7' using plotly
specific_key <- "n_neighbors=30_min_dist=0.7"
specific_umap_result <- umap_results[[specific_key]]
specific_plot_UMAP <- plot_ly(x = specific_umap_result$layout[, 1], y = specific_umap_result$layout[, 2], type = "scatter", mode = "markers",
                              marker = list(size = 5),
                              name = specific_key)

# Customize the layout of the specific plot (optional)
specific_plot_UMAP <- layout(specific_plot_UMAP, title = "UMAP Plot for n_neighbors = 30, min_dist = 0.7")

# Display the specific plot
specific_plot_UMAP
[Figure: UMAP Plot for n_neighbors = 30, min_dist = 0.7]

Which combination of values for n_neighbors and min_dist works best to preserve the original structure?
Answer: In general, small values of n_neighbors and min_dist emphasize local structure, whereas larger values capture global structure better. In our case, n_neighbors = 30 and min_dist = 0.7 seem to preserve the original structure best.
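
The local side of this trade-off can also be checked numerically. The sketch below is an illustration under stated assumptions, not the method used in this sheet: for every point it compares the k nearest neighbours in the original space with the k nearest neighbours in the embedding and reports the mean fraction they share. The helper names `knn_index` and `knn_overlap` and the synthetic data are hypothetical.

```r
# Sketch: fraction of k nearest neighbours that survive the projection.
# knn_index and knn_overlap are hypothetical helper names.

knn_index <- function(m, k) {
  d <- as.matrix(dist(as.matrix(m)))
  diag(d) <- Inf  # a point is not its own neighbour
  # One vector of neighbour indices per point
  lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])
}

knn_overlap <- function(original, embedding, k = 10) {
  nn_orig <- knn_index(original, k)
  nn_emb  <- knn_index(embedding, k)
  # Number of shared neighbours for each point
  shared <- mapply(function(a, b) length(intersect(a, b)), nn_orig, nn_emb)
  mean(shared / k)
}

set.seed(2)
# Synthetic stand-in for a point cloud: almost flat along the third axis
cloud_3d <- cbind(rnorm(300), rnorm(300), rnorm(300, sd = 0.01))

# Projection that keeps the two informative axes
kept_2d <- cloud_3d[, 1:2]
# Same coordinates with rows shuffled, so neighbourhoods are destroyed
shuffled_2d <- kept_2d[sample(nrow(kept_2d)), ]

overlap_kept     <- knn_overlap(cloud_3d, kept_2d, k = 10)      # close to 1
overlap_shuffled <- knn_overlap(cloud_3d, shuffled_2d, k = 10)  # close to 0
```

For the embeddings computed above, a call such as `knn_overlap(reduced_point_cloud, umap_results[["n_neighbors=30_min_dist=0.7"]]$layout)` would score one parameter combination, again assuming `reduced_point_cloud` contains only numeric coordinate columns.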

Lastly, compare your results from using t-SNE and UMAP. Which method worked best to preserve the original structure of the point cloud?
Answer: Each method has its own strengths and weaknesses. t-SNE works well for reducing dimensions, but in my case UMAP preserved the original structure of the point cloud best.